[compiler] Redesign Repartition IR nodes to be naive coalesce only #12671

tpoterba · 2023-02-08T15:58:14Z

This changes the execution semantics for the Spark backend. Previously, repartitioning dispatched to either (a) a two-pass algorithm that scans and coalesces (shuffle=False) or (b) a two-pass algorithm that shuffles and then rekeys (shuffle=True). Now we use write/read instead.
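
For readers unfamiliar with the term, here is a minimal pure-Python sketch of what "naive coalesce" means in general: adjacent partitions are merged to reduce the partition count, with no shuffle and no attempt to rebalance rows. This is an illustration only, not Hail's implementation. The function name `naive_coalesce_sketch` and the list-of-lists partition model are assumptions made for the example.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def naive_coalesce_sketch(partitions: Sequence[Sequence[T]], max_partitions: int) -> List[List[T]]:
    """Merge adjacent partitions until at most `max_partitions` remain.

    Hypothetical helper for illustration: partitions are modeled as plain
    lists of rows. Row order is preserved and no rows move between
    non-adjacent partitions, so the result may be unbalanced.
    """
    if max_partitions < 1:
        raise ValueError("max_partitions must be at least 1")

    n = len(partitions)
    if max_partitions >= n:
        # Coalescing never increases the partition count.
        return [list(p) for p in partitions]

    merged: List[List[T]] = []
    for i in range(max_partitions):
        # Contiguous range of input partition indices for output partition i.
        start = i * n // max_partitions
        end = (i + 1) * n // max_partitions
        group: List[T] = []
        for p in partitions[start:end]:
            group.extend(p)
        merged.append(group)
    return merged
```

Because only neighboring partitions are combined, the global row order (and therefore any key ordering) is unchanged, which is why no rekeying pass is needed in this model.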

patrick-schultz · 2023-03-01T13:45:08Z

hail/python/test/hail/matrixtable/test_matrix_table.py

-    # test MatrixRepartition
-    if not hl.current_backend().requires_lowering:
-        rmt = hl.utils.range_matrix_table(20, 10, 3)
-        mt = rmt.repartition(5)
-        assert_contains_node(mt, ir.MatrixRepartition)
-        assert_unique_uids(mt)
-


Can we rewrite this and the TableRepartition one to use naive_coalesce, so the handle_randomness implementations still have test coverage?

get it done Patrick!

tpoterba assigned patrick-schultz Feb 8, 2023

fix MatrixRepartition python ir

eb0ba6c

patrick-schultz previously requested changes Mar 1, 2023
